On-line Learning


On-line Learning of Dichotomies

Neural Information Processing Systems

The performance of on-line algorithms for learning dichotomies is studied. In on-line learning, the number of examples P is equivalent to the learning time, since each example is presented only once. The learning curve, or generalization error as a function of P, depends on the schedule at which the learning rate is lowered. For a target that is a perceptron rule, the learning curve of the perceptron algorithm can decrease as fast as P^{-1}, if the schedule is optimized. If the target is not realizable by a perceptron, the perceptron algorithm does not generally converge to the solution with lowest generalization error.
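
A minimal numerical sketch of this setting (an illustration, not the paper's analysis): an on-line perceptron learning a realizable perceptron target under an annealed 1/t learning-rate schedule. The dimension, schedule constant, and example budget are arbitrary choices; the error measure uses the standard fact that, for isotropic Gaussian inputs, the generalization error equals the teacher-student angle divided by pi.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100        # input dimension (illustrative)
P = 10_000     # number of on-line examples; each is seen exactly once
eta0 = 1.0     # schedule constant (illustrative, untuned)

B = rng.standard_normal(N)            # teacher: the perceptron target
B /= np.linalg.norm(B)
J = 0.01 * rng.standard_normal(N)     # student weights

for t in range(1, P + 1):
    x = rng.standard_normal(N)        # a fresh example at each time step
    y = np.sign(B @ x)                # label given by the perceptron rule
    if np.sign(J @ x) != y:           # perceptron algorithm: update on mistakes
        J += (eta0 / t) * y * x       # learning rate lowered as 1/t

# For Gaussian inputs: generalization error = angle(J, B) / pi
theta = np.arccos(np.clip((J @ B) / np.linalg.norm(J), -1.0, 1.0))
print(f"generalization error after P={P}: {theta / np.pi:.4f}")
```

Doubling P should roughly halve the printed error when the schedule constant is well chosen, which is the P^{-1} behavior the abstract describes.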


Adaptive Back-Propagation in On-Line Learning of Multilayer Networks

Neural Information Processing Systems

An adaptive back-propagation algorithm is studied and compared with gradient descent (standard back-propagation) for on-line learning in two-layer neural networks with an arbitrary number of hidden units. Within a statistical mechanics framework, both numerical studies and a rigorous analysis show that the adaptive method trains faster than gradient descent: it breaks the symmetry between hidden units more efficiently and converges more quickly to optimal generalization.
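
The paper's adaptive rule itself is not reproduced here; as a hedged sketch of the baseline it improves on, the following implements standard on-line back-propagation (gradient descent on the squared error) for a two-layer soft committee machine. All sizes, the erf activation, and the initial conditions are illustrative assumptions.

```python
import numpy as np
from scipy.special import erf

rng = np.random.default_rng(0)
N, K = 50, 3    # input dimension and number of hidden units (illustrative)
eta = 0.5       # learning rate (illustrative)
P = 20_000

g = lambda a: erf(a / np.sqrt(2.0))                        # hidden activation
dg = lambda a: np.sqrt(2.0 / np.pi) * np.exp(-a ** 2 / 2)  # its derivative

B = rng.standard_normal((K, N))          # teacher network
W = 0.01 * rng.standard_normal((K, N))   # student, nearly symmetric start

for t in range(P):
    x = rng.standard_normal(N)
    y = g(B @ x).sum()                   # teacher output
    a = W @ x                            # student hidden activations
    err = y - g(a).sum()
    # Standard back-propagation sends the same error signal to every
    # hidden unit, which is what keeps them trapped in the symmetric
    # phase; the paper's adaptive variant breaks this symmetry faster.
    W += (eta / N) * np.outer(err * dg(a), x)
```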


On-line Learning from Finite Training Sets in Nonlinear Networks

Neural Information Processing Systems

Online learning is one of the most common forms of neural network training. We present an analysis of online learning from finite training sets for non-linear networks (namely, soft-committee machines), advancing the theory to more realistic learning scenarios. Dynamical equations are derived for an appropriate set of order parameters; these are exact in the limiting case of either linear networks or infinite training sets. Preliminary comparisons with simulations suggest that the theory captures some effects of finite training sets, but may not yet account correctly for the presence of local minima.
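
A simulation sketch of the scenario analyzed here, assuming the same soft-committee architecture as above: the student now recycles examples drawn with replacement from a fixed training set instead of receiving a fresh example at every step. Training-set size, dimensions, and the Monte Carlo test size are illustrative.

```python
import numpy as np
from scipy.special import erf

rng = np.random.default_rng(1)
N, K = 50, 2
eta, steps = 0.5, 50_000
n_train = 200      # finite training set (illustrative)

g = lambda a: erf(a / np.sqrt(2.0))
dg = lambda a: np.sqrt(2.0 / np.pi) * np.exp(-a ** 2 / 2)

B = rng.standard_normal((K, N))
X = rng.standard_normal((n_train, N))   # the fixed training set
Y = g(X @ B.T).sum(axis=1)              # teacher labels, computed once
W = 0.01 * rng.standard_normal((K, N))

for t in range(steps):
    i = rng.integers(n_train)           # examples are re-used: the key
    x, y = X[i], Y[i]                   # difference from infinite-set theory
    a = W @ x
    W += (eta / N) * np.outer((y - g(a).sum()) * dg(a), x)

# Monte Carlo estimate of the generalization error on fresh inputs
Xt = rng.standard_normal((5000, N))
eg = 0.5 * np.mean((g(Xt @ B.T).sum(1) - g(Xt @ W.T).sum(1)) ** 2)
print(f"estimated generalization error: {eg:.4f}")
```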


Tight Bounds for the VC-Dimension of Piecewise Polynomial Networks

Neural Information Processing Systems

O(ws(s log d + log(dqh/s))) and O(ws((h/s) log q + log(dqh/s))) are upper bounds for the VC-dimension of a set of neural networks of units with piecewise polynomial activation functions, where s is the depth of the network, h is the number of hidden units, w is the number of adjustable parameters, q is the maximum number of polynomial segments of the activation function, and d is the maximum degree of the polynomials; also Ω(ws log(dqh/s)) is a lower bound for the VC-dimension of such a network set. These bounds are tight for the cases s = Θ(h) and s constant. For the special case q = 1, the VC-dimension is Θ(ws log d).
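
Restated in display form for readability (the placement of each '+' inside the upper bounds is inferred, since the source text ran the terms together):

```latex
\mathrm{VCdim} = O\bigl(ws\,(s\log d + \log(dqh/s))\bigr)
  \quad\text{and}\quad
\mathrm{VCdim} = O\bigl(ws\,((h/s)\log q + \log(dqh/s))\bigr), \\
\mathrm{VCdim} = \Omega\bigl(ws\log(dqh/s)\bigr), \qquad
\mathrm{VCdim} = \Theta(ws\log d)\ \text{when } q = 1.
```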